Implementation of a Customizable Fault Tolerance Framework

Authors

  • I-Ling Yen
  • Iftikhar Ahmed
  • Ramanujam Jagannath
  • Sreeparna Kundu
Abstract

While there have been significant advances in fault tolerance research, the effort has focused on the design of individual fault-tolerant systems or methodologies. Recently, some research has been initiated to develop fault tolerance paradigms that can be used to provide a spectrum of fault tolerance levels. In this paper, we present the design of a fault tolerance framework that can be used to support a wide spectrum of applications with various fault tolerance requirements, various criticality levels, and various system models. The framework is designed to be parameterizable so that the user can configure it to obtain the desired features. The framework is also designed to be an off-the-shelf component, so that application programs can be integrated within it easily to obtain a fault-tolerant version of the application system. A specialized N-modular redundancy (SNMR) scheme has been developed to serve as the primary approach for achieving efficient and cost-effective fault tolerance for the framework. In most cases, the SNMR scheme yields better performance and lower cost in providing fault tolerance than conventional NMR schemes. It also enhances the scalability and customizability of the general replication method. This paper discusses the major concepts of the SNMR framework and the main issues in its design and implementation, including an object-oriented overall system design and the interface protocol class hierarchy. The interface protocol class hierarchy provides a convenient paradigm for implementing a customizable, highly reusable, and easily extensible SNMR framework.
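For readers unfamiliar with the baseline that the abstract compares against, the following is a minimal sketch of conventional N-modular redundancy with majority voting. It does not reproduce the paper's SNMR scheme, whose details the abstract does not give. The names `nmr_vote`, `square`, and `faulty_square` are hypothetical and chosen only for illustration, and the sketch assumes replica results are hashable so they can be counted.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def nmr_vote(replicas: Sequence[Callable[[int], Hashable]], x: int) -> Hashable:
    """Run every replica on x and return the result held by a strict majority.

    Illustrative only: conventional NMR voting, not the paper's SNMR scheme.
    """
    results = []
    for replica in replicas:
        try:
            results.append(replica(x))
        except Exception:
            # A crashed replica simply contributes no vote.
            continue
    if not results:
        raise RuntimeError("all replicas failed")
    value, count = Counter(results).most_common(1)[0]
    # Require more than half of ALL replicas (not just survivors) to agree.
    if count <= len(replicas) // 2:
        raise RuntimeError("no majority among replica results")
    return value


def square(x: int) -> int:
    """A correct replica (hypothetical application function)."""
    return x * x


def faulty_square(x: int) -> int:
    """A replica with an injected value fault."""
    return x * x + 1


# Triple modular redundancy: one faulty replica is outvoted by two correct ones.
voted = nmr_vote([square, square, faulty_square], 4)
```

With three replicas, one faulty result is masked; with only two replicas that disagree, the voter cannot decide and raises an error. Every replica runs on every request, which is the cost that specialized schemes aim to reduce.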

Similar resources

Customizable Fault Tolerance for Wide-Area Byzantine Replication

This paper presents a hierarchical replication architecture, tailored to systems that span multiple wide-area sites, that enables free substitution of the fault tolerance method used in each position of the hierarchy. This unique approach enables customization based on perceived risks, balancing performance and fault tolerance, by deploying either a Byzantine or a benign fault-tolerant protocol...

Design and Implementation of a Safe, Reflective Middleware Framework

With the rapid evolution of the global information infrastructure and ubiquitous computing environments, service providers will need to provide effective and adaptive resource management mechanisms that can serve concurrent applications in the presence of changing system conditions. Flexible, scalable and customizable middleware can be used as enabling technology for next generation systems tha...

A Lightweight Fault Tolerance Framework for Web Services

In this paper, we present the design and implementation of a lightweight fault tolerance framework for Web services. With our framework, a Web service can be rendered fault tolerant by replicating it across several nodes. A consensus-based algorithm is used to ensure total ordering of incoming application requests to the replicated Web service, and to ensure consistent membership view among the...

Novel Defect Terminology Beside Evaluation And Design Fault Tolerant Logic Gates In Quantum-Dot Cellular Automata

Quantum-dot Cellular Automata (QCA) is one of the important nano-level technologies for implementing both combinational and sequential systems. QCA has the potential to achieve low power dissipation and to operate at high speed, at THz frequencies. However, the high probability of fabrication defects in QCA is a fundamental challenge to using this emerging technology. Because of these vari...

A Framework For Proactive Fault Tolerance

Fault tolerance is a major concern in guaranteeing the availability of critical services as well as application execution. Traditional approaches to fault tolerance include checkpoint/restart and duplication. However, it is also possible to anticipate failures and proactively take action before they occur in order to minimize their impact on the system and on application execution. This document pre...

Publication date: 1998